
    Laplace's rule of succession in information geometry

    Laplace's "add-one" rule of succession modifies the observed frequencies in a sequence of heads and tails by adding one to the observed counts. This improves prediction by avoiding zero probabilities and corresponds to a uniform Bayesian prior on the parameter. The canonical Jeffreys prior corresponds to the "add-one-half" rule. We prove that, for exponential families of distributions, such Bayesian predictors can be approximated by taking the average of the maximum likelihood predictor and the sequential normalized maximum likelihood predictor from information theory. Thus, in this case, it is possible to approximate Bayesian predictors without the cost of integrating or sampling in parameter space.

    Bandit Online Optimization Over the Permutahedron

    The permutahedron is the convex polytope whose vertex set consists of the vectors $(\pi(1),\dots,\pi(n))$ for all permutations (bijections) $\pi$ over $\{1,\dots,n\}$. We study a bandit game in which, at each step $t$, an adversary chooses a hidden weight vector $s_t$, and a player chooses a vertex $\pi_t$ of the permutahedron and suffers an observed loss of $\sum_{i=1}^n \pi_t(i)\, s_t(i)$. A previous algorithm, CombBand of Cesa-Bianchi et al. (2009), guarantees a regret of $O(n\sqrt{T \log n})$ for a time horizon of $T$. Unfortunately, CombBand requires at each step an approximation of an $n$-by-$n$ matrix permanent to within an accuracy that improves as $T$ grows, resulting in a total running time that is superlinear in $T$, making it impractical for large time horizons. We provide an algorithm of regret $O(n^{3/2}\sqrt{T})$ with total time complexity $O(n^3 T)$. The ideas combine CombBand with a recent algorithm by Ailon (2013) for online optimization over the permutahedron in the full-information setting. The technical core is a bound on the variance of the "pseudo loss" of the Plackett-Luce noisy sorting process. The bound is obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices generated from rational functions of exponentials of 3 parameters.

    Statistical Mechanics of Linear and Nonlinear Time-Domain Ensemble Learning

    Conventional ensemble learning combines students in the space domain. In this paper, however, we combine students in the time domain and call this time-domain ensemble learning. We analyze, compare, and discuss the generalization performance of time-domain ensemble learning for both a linear model and a nonlinear model. Analyzing within the framework of online learning using a statistical-mechanical method, we show qualitatively different behaviors between the two models. In the linear model, the dynamical behavior of the generalization error is monotonic. We analytically show that time-domain ensemble learning is twice as effective as conventional ensemble learning. Furthermore, the generalization error of the nonlinear model exhibits nonmonotonic dynamical behavior when the learning rate is small. We numerically show that the generalization performance can be improved remarkably by exploiting this phenomenon and the divergence of students in the time domain. Comment: 11 pages, 7 figures

    IR ion spectroscopy in a combined approach with MS/MS and IM-MS to discriminate epimeric anthocyanin glycosides (cyanidin 3-O-glucoside and -galactoside)

    Anthocyanins are widespread in plants and flowers, being responsible for their varied colouring. Two representative members of this family have been selected, cyanidin 3-O-β-glucopyranoside and cyanidin 3-O-β-galactopyranoside, and probed by mass-spectrometry-based methods to test their performance in discriminating between the two epimers. The native anthocyanins, delivered into the gas phase by electrospray ionization, display a comparable drift time in ion mobility mass spectrometry (IM-MS) and a common fragment, corresponding to loss of the sugar moiety, in their collision-induced dissociation (CID) pattern. However, the IR multiple photon dissociation (IRMPD) spectra in the fingerprint range show a feature that is particularly evident in the case of the glucoside. This signature is used to identify the presence of cyanidin 3-O-β-glucopyranoside in a natural extract of pomegranate. In an effort to increase the differentiation between the two epimers, aluminum complexes were prepared and sampled for elemental composition by FT-ICR-MS. CID experiments now display an extensive fragmentation pattern, showing a few product ions peculiar to each species. More noteworthy is the IRMPD behavior in the OH stretching range, which shows significant differences between the spectra of the two epimers. DFT calculations allow interpretation of the observed distinct bands in terms of a varied network of hydrogen bonding and relative conformer stability.

    Byzantine Stochastic Gradient Descent

    This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the $m$ machines which allegedly compute stochastic gradients at every iteration, an $\alpha$-fraction are Byzantine and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds $\varepsilon$-approximate minimizers of convex functions in $T = \tilde{O}\big( \frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2} \big)$ iterations. In contrast, traditional mini-batch SGD needs $T = O\big( \frac{1}{\varepsilon^2 m} \big)$ iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sample complexity and time complexity.

    On the Prior Sensitivity of Thompson Sampling

    The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution to balance exploration and exploitation more effectively. While it is generally believed that the algorithm's regret is low (high) when the prior is good (bad), little is known about the exact dependence. In this paper, we fully characterize the algorithm's worst-case dependence of regret on the choice of prior, focusing on a special yet representative case. These results also provide insights into the general sensitivity of the algorithm to the choice of priors. In particular, with $p$ being the prior probability mass of the true reward-generating model, we prove $O(\sqrt{T/p})$ and $O(\sqrt{(1-p)T})$ regret upper bounds for the bad- and good-prior cases, respectively, as well as matching lower bounds. Our proofs rely on the discovery of a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the literature, to the best of our knowledge. Comment: Appears in the 27th International Conference on Algorithmic Learning Theory (ALT), 201